fix: Correct Elasticsearch docs for 10k row cap, object type mapping, and timeout #1662

Merged
lukekim merged 2 commits into trunk from fix/elasticsearch-10k-cap-and-type-mapping on May 6, 2026

Conversation

@claudespice
Collaborator

Summary

  • 10,000-row fetch cap: The Elasticsearch connector silently clamps all query results to 10,000 rows (the Elasticsearch default `index.max_result_window`), even when the SQL `LIMIT` is higher or omitted. This behavior was undocumented. Added it as a limitation on the connector index page and updated the deployment guide's "Result size" bullet to call out the hard cap.
  • `object` type mapping: The type mapping table incorrectly showed `object` fields as `Utf8` (serialized JSON). In practice, `object` fields with sub-properties are flattened into dot-separated columns (e.g. `address.city`), which the prose below the table already stated. Split the table row to distinguish `object` (with sub-fields) from `object` (no sub-fields) and `nested`.
  • Timeout description: The request timeout row said "including retries", but the data connector's read path does not retry; only the vector engine's `bulk_index` path does. Changed to "Maximum time for each individual HTTP request."
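
For reviewers, the row cap in the first bullet can be pictured with a minimal Python sketch. This is an illustration of the documented behavior only, not the connector's code: `effective_size` and `MAX_RESULT_WINDOW` are hypothetical names, and the only facts taken from this PR are the 10,000 figure and that the cap applies whether `LIMIT` is higher or omitted.

```python
from typing import Optional

# Elasticsearch's default index.max_result_window, as cited in the summary.
MAX_RESULT_WINDOW = 10_000


def effective_size(sql_limit: Optional[int]) -> int:
    """Hypothetical helper: the `size` a search request would carry after clamping.

    `sql_limit` is None when the SQL query has no LIMIT clause.
    """
    if sql_limit is None:
        # No LIMIT: the result is still capped at the window size.
        return MAX_RESULT_WINDOW
    # A LIMIT below the cap passes through; anything above it is clamped.
    return min(sql_limit, MAX_RESULT_WINDOW)
```

Under this sketch, `LIMIT 50000` and a query with no `LIMIT` both fetch at most 10,000 rows, while `LIMIT 25` is unaffected.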

Test plan

  • Verify the connector index page renders correctly with the split type mapping rows
  • Confirm the new limitation bullet appears in the Limitations section
  • Confirm the deployment guide timeout table and capacity section reflect the updated text

The metric types for `replication_bootstrap_rows_total` and
`replication_bootstrap_complete` were swapped. Per the source
implementation, `replication_bootstrap_rows_total` is an
ObservableCounter (not ObservableGauge) and
`replication_bootstrap_complete` is an ObservableGauge (not
ObservableCounter).
…e mapping, and timeout description

- Document the 10,000-row hard cap on query results in both the
  connector index and deployment guide. The connector clamps the
  Elasticsearch `size` parameter to 10,000 regardless of SQL LIMIT.
- Split the `object`/`nested` type mapping table row: object fields
  with sub-properties are flattened into dot-separated columns, not
  serialized as JSON. Only object fields without sub-fields and
  `nested` fields become Utf8 JSON strings.
- Fix the request timeout description: the 30s timeout applies per
  individual HTTP request, not "including retries" (read operations
  do not retry).
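
The flattening rule in the second bullet can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the connector's implementation: `flatten_row` is a hypothetical helper, the mapping is assumed to follow the usual Elasticsearch `properties` layout, and arrays of objects under a flattened field are ignored for brevity.

```python
import json
from typing import Any, Dict


def flatten_row(doc: Dict[str, Any], properties: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Hypothetical helper: map one document to flat columns using its mapping."""
    row: Dict[str, Any] = {}
    for name, field in properties.items():
        column = f"{prefix}{name}"
        value = doc.get(name)
        # A field that declares `properties` without a `type` is an object.
        field_type = field.get("type", "object")
        sub_properties = field.get("properties")
        if field_type == "object" and sub_properties:
            # object with sub-fields: one dot-separated column per leaf.
            nested_doc = value if isinstance(value, dict) else {}
            row.update(flatten_row(nested_doc, sub_properties, prefix=column + "."))
        elif field_type in ("object", "nested"):
            # object without sub-fields, or nested: a single JSON string column.
            row[column] = None if value is None else json.dumps(value)
        else:
            # Scalar field: keep the value as-is.
            row[column] = value
    return row
```

With a mapping where `address` declares a `city` sub-property, the sketch yields an `address.city` column, while an `object` field with no sub-fields or a `nested` field stays a single JSON string column.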
claudespice added the bug and area/docs labels on May 6, 2026
claudespice self-assigned this on May 6, 2026
github-actions Bot commented May 6, 2026

🔍 Pull with Spice Failed

Passing checks:

  • ✅ Title meets minimum length requirement (10 characters)
  • ✅ Has at least one of the required labels: area/blog, area/docs, area/cookbook, dependencies
  • ✅ No banned labels detected

Failed checks:

  • ❌ At least one assignee is required for this pull request.

Please address these issues and update your pull request.

github-actions Bot commented May 6, 2026

🚀 deployed to https://70e18963.spiceai-org-website.pages.dev

@lukekim lukekim merged commit 21b1f40 into trunk May 6, 2026
6 of 9 checks passed
@lukekim lukekim deleted the fix/elasticsearch-10k-cap-and-type-mapping branch May 6, 2026 16:47
Labels

area/docs, bug

2 participants